Tallinn
- Europe > Ukraine > Kyiv Oblast > Kyiv (0.14)
- Europe > Austria > Vienna (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- (96 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Education > Health & Safety > School Nutrition (0.93)
- Health & Medicine > Consumer Health (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.73)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)
Supplementary material - ABCFair: an Adaptable Benchmark approach for Comparing Fairness Methods
We used the sex and the education of the student's parents as the sensitive attributes for this dataset. We removed all features that are other expressions of the labels. Note that this is the only folktables dataset on which we report results in the main paper. Sex, age, and race are used as sensitive features for this dataset. We deem these features not relevant for this use case.
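As a rough illustration of this kind of preprocessing, the sketch below separates sensitive attributes from the remaining features and drops label-leaking columns. The column names are invented placeholders, not the actual ABCFair or folktables schema.

```python
# Hypothetical sketch of fairness-benchmark preprocessing; column names are
# invented placeholders, not the actual ABCFair / folktables schema.
ROWS = [
    {"sex": "F", "parent_edu": "higher", "study_time": 3,
     "grade_g1": 14, "grade_g2": 15, "final_grade": 15},
    {"sex": "M", "parent_edu": "secondary", "study_time": 1,
     "grade_g1": 8, "grade_g2": 9, "final_grade": 9},
]

SENSITIVE = ["sex", "parent_edu"]   # attributes the fairness methods condition on
LABEL = "final_grade"
LEAKY = ["grade_g1", "grade_g2"]    # other expressions of the label: removed

def split_row(row):
    """Split one record into (features, sensitive attributes, label)."""
    sensitive = {k: row[k] for k in SENSITIVE}
    features = {k: v for k, v in row.items()
                if k not in SENSITIVE and k not in LEAKY and k != LABEL}
    return features, sensitive, row[LABEL]
```

Removing the label-correlated columns up front keeps the benchmark from rewarding methods that simply reconstruct the target from near-duplicates of it.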
- Europe > France (0.05)
- Europe > Estonia > Harju County > Tallinn (0.05)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Bristol (0.04)
- (3 more...)
- Information Technology (0.46)
- Education (0.46)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (2 more...)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- Europe > France (0.05)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- (5 more...)
What Is Claude? Anthropic Doesn't Know, Either
Researchers at the company are trying to understand their A.I. system's mind--examining its neurons, running it through psychology experiments, and putting it on the therapy couch. It has become increasingly clear that Claude's selfhood, much like our own, is a matter of both neurons and narratives.

A large language model is nothing more than a monumental pile of small numbers. It converts words into numbers, runs those numbers through a numerical pinball game, and turns the resulting numbers back into words. Similar piles are part of the furniture of everyday life. Meteorologists use them to predict the weather. Epidemiologists use them to predict the paths of diseases. Among regular people, they do not usually inspire intense feelings. But when these A.I. systems began to predict the path of a sentence--that is, to talk--the reaction was widespread delirium. As a cognitive scientist wrote recently, "For hurricanes or pandemics, this is as rigorous as science gets; for sequences of words, everyone seems to lose their mind." It's hard to blame them. Language is, or rather was, our special thing. We weren't prepared for the arrival of talking machines.

Ellie Pavlick, a computer scientist at Brown, has drawn up a taxonomy of our most common responses. There are the "fanboys," who man the hype wires. They believe that large language models are intelligent, maybe even conscious, and prophesy that, before long, they will become superintelligent. The venture capitalist Marc Andreessen has described A.I. as "our alchemy, our Philosopher's Stone--we are literally making sand think." The fanboys' deflationary counterparts are the "curmudgeons," who claim that there's no there there, and that only a blockhead would mistake a parlor trick for the soul of the new machine. In the recent book "The AI Con," the linguist Emily Bender and the sociologist Alex Hanna belittle L.L.M.s as "mathy maths," "stochastic parrots," and "a racist pile of linear algebra."
But, Pavlick writes, "there is another way to react." It is O.K., she offers, "to not know." What Pavlick means, on the most basic level, is that large language models are black boxes. We don't really understand how they work. We don't know if it makes sense to call them intelligent, or if it will ever make sense to call them conscious. The existence of talking machines--entities that can do many of the things that only we have ever been able to do--throws a lot of other things into question. We refer to our own minds as if they weren't also black boxes.
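The "pile of numbers" picture can be made concrete with a toy stand-in: a tiny table of bigram scores plays the role of the billions of parameters, and next-word prediction becomes a table lookup. Everything below is invented for illustration; a real model's computation is vastly more elaborate.

```python
# Toy illustration of the "pile of numbers" view: a bigram score table stands
# in for the billions of parameters of a real model. All values are invented.
VOCAB = ["<s>", "the", "cat", "sat", "down"]
TO_ID = {w: i for i, w in enumerate(VOCAB)}   # words become numbers...

BIGRAM = [  # BIGRAM[i][j]: score for word j following word i
    [0, 5, 0, 0, 0],   # <s>  -> the
    [0, 0, 4, 0, 1],   # the  -> cat
    [0, 0, 0, 6, 0],   # cat  -> sat
    [0, 2, 0, 0, 3],   # sat  -> down
    [0, 0, 0, 0, 0],
]

def next_word(word):
    scores = BIGRAM[TO_ID[word]]               # run the numbers
    return VOCAB[scores.index(max(scores))]    # ...and numbers become words again
```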
- South America > Colombia (0.14)
- Asia > Russia (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- (5 more...)
- Transportation (1.00)
- Leisure & Entertainment > Games (1.00)
- Law (1.00)
- (6 more...)
- Europe > Germany > Lower Saxony > Hanover (0.05)
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (7 more...)
- Asia > Afghanistan (0.14)
- Europe > Finland > Uusimaa > Helsinki (0.05)
- Asia > Middle East > Israel (0.05)
- (19 more...)
- Leisure & Entertainment > Sports > Football (1.00)
- Law Enforcement & Public Safety > Terrorism (0.68)
- Government > Regional Government > North America Government > United States Government (0.68)
Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages
Samuel, David, Øvrelid, Lilja, Velldal, Erik, Kutuzov, Andrey
We propose a post-training method for lower-resource languages that preserves fluency of language models even when aligned by disfluent reward models. Preference optimization is now a well-researched topic, but previous work has mostly addressed models for English and Chinese. Lower-resource languages lack both datasets written by native speakers and language models capable of generating fluent synthetic data. Thus, in this work, we focus on developing a fluent preference-aligned language model without any instruction-tuning data in the target language. Our approach uses an on-policy training method, which we compare with two common approaches: supervised finetuning on machine-translated data and multilingual finetuning. We conduct a case study on Norwegian Bokmål and evaluate fluency through native-speaker assessments. The results show that the on-policy aspect is crucial and outperforms the alternatives without relying on any hard-to-obtain data.
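The on-policy idea can be caricatured in a few lines: the current policy generates both candidates in each preference pair, a crude reward model picks a winner, and the policy shifts toward it. The candidate pool, reward function, and update rule below are invented stand-ins, not the paper's actual method.

```python
import random

random.seed(0)

# Toy caricature of on-policy preference optimization; the candidate pool,
# reward function, and update rule are invented, not the paper's method.
CANDIDATES = ["short reply", "a longer and more fluent reply", "ok"]
weights = {c: 1.0 for c in CANDIDATES}   # a trivially parameterized "policy"

def reward(text):
    return len(text)   # stand-in for a (possibly disfluent) reward model

def sample():
    """Draw a candidate from the current policy (weighted choice)."""
    return random.choices(CANDIDATES, [weights[c] for c in CANDIDATES])[0]

for _ in range(50):
    a, b = sample(), sample()         # on-policy: both come from the policy itself
    winner = a if reward(a) >= reward(b) else b
    weights[winner] *= 1.1            # nudge the policy toward the preferred sample
```

The contrast with supervised finetuning on machine-translated data is that here the preference pairs are drawn from the model's own distribution, so the reward model only ever has to rank outputs the policy can actually produce.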
- Europe > Austria > Vienna (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- (22 more...)
- Media > Music (0.50)
- Leisure & Entertainment (0.50)
PPTArena: A Benchmark for Agentic PowerPoint Editing
Ofengenden, Michael, Man, Yunze, Pang, Ziqi, Wang, Yu-Xiong
We introduce PPTArena, a benchmark for PowerPoint editing that measures reliable modifications to real slides under natural-language instructions. In contrast to image-PDF renderings or text-to-slide generation, PPTArena focuses on in-place editing across 100 decks, 2,125 slides, and over 800 targeted edits covering text, charts, tables, animations, and master-level styles. Each case includes a ground-truth deck, a fully specified target outcome, and a dual VLM-as-judge pipeline that separately scores instruction following and visual quality using both structural diffs and slide images. Building on this setting, we propose PPTPilot, a structure-aware slide-editing agent that plans semantic edit sequences, routes between high-level programmatic tools and deterministic XML operations for precise control, and verifies outputs through an iterative plan-edit-check loop against task-specific constraints. In our experiments, PPTPilot outperforms strong proprietary agents and frontier VLM systems by over 10 percentage points on compound, layout-sensitive, and cross-slide edits, with particularly large gains in visual fidelity and deck-wide consistency. Despite these improvements, existing agents still underperform on long-horizon, document-scale tasks in PPTArena, highlighting the remaining challenges in reliable PPT editing.
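The plan-edit-check control flow the abstract describes can be sketched abstractly: a planner splits the instruction into atomic steps, each step is applied deterministically, and a checker verifies task-specific constraints before the result is accepted. The planner, edit operation, and constraints below are invented toys, not PPTPilot's actual tooling or the benchmark's grading pipeline.

```python
# Invented sketch of an iterative plan-edit-check loop; a dict of slide titles
# stands in for a real PowerPoint deck, and only one toy edit is supported.
def plan(instruction):
    """Break a natural-language instruction into atomic edit steps (toy)."""
    return [s.strip() for s in instruction.split(" and ")]

def apply_step(deck, step):
    """Apply one deterministic edit to a dict-based 'deck' (toy)."""
    deck = dict(deck)
    if step.startswith("retitle slide "):
        rest = step[len("retitle slide "):]
        num, _, title = rest.partition(" to ")
        deck[int(num)] = title
    return deck

def check(deck, constraints):
    """Verify task-specific constraints on the edited deck."""
    return all(deck.get(slide) == title for slide, title in constraints.items())

def edit_with_verification(deck, instruction, constraints, max_rounds=3):
    for _ in range(max_rounds):            # iterative plan-edit-check loop
        for step in plan(instruction):     # plan, then edit...
            deck = apply_step(deck, step)
        if check(deck, constraints):       # ...then check before accepting
            return deck
    raise RuntimeError("constraints not satisfied")
```

Gating acceptance on an explicit constraint check, rather than trusting the edit sequence, is what lets this kind of agent retry when a step silently fails.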
- Europe > Austria > Vienna (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- South America > Peru > Loreto Department (0.04)
- (4 more...)